Assessing thermalization and estimating the Hamiltonian with output data only 
I consider the generic situation where a finite number of identical test systems in varying (possibly 
unknown) initial states are subjected independently to the same unknown process. I show how one 
can infer from the output data alone whether or not the process in question induces thermalization, 
and if so, which constants of the motion characterize the final equilibrium states. In case thermal- 
ization does occur and there is no evidence for constants of the motion other than energy, I further 
show how the same output data can be used to estimate the test systems' effective Hamiltonian. 
For both inference tasks I devise a statistical framework inspired by the generic techniques of factor 
and principal component analysis. I illustrate its use in the simple example of qubits. 

PACS numbers: 05.30.-d, 03.65.Wj, 03.65.Yz, 05.70.Ln 



I. INTRODUCTION 

Controversies over the apparent dichotomy between 
microscopic reversibility and macroscopic irreversibility 
are as old as statistical mechanics itself and continue to 
the present day, as exemplified by the popular Ref. [1]
and the ensuing vivid debate [2]. Broadly speaking, the 
issue can be tackled "bottom up" or "top down". The 
bottom up approach, which has been pursued by the 
majority of researchers, involves specifying (or at least 
imposing constraints on) some microscopic Hamiltonian 
and subsequently studying the evolution of those degrees 
of freedom that are deemed "macroscopic" , "accessible" 
or otherwise "relevant" to the problem at hand. This 
line of research has of late enjoyed cross-fertilization with 
topical areas such as nanoscale thermodynamics 0-[l|> 
quantum many-body physics 0413 and quantum information [13-16],
leading to some powerful new results.
They confirm that the eventual thermalization of a quan- 
tum system is a universal phenomenon which holds true 
for virtually all Hamiltonians and sensible choices for the 
relevant degrees of freedom [17-20]. The rather generic
assumptions that are needed amount to (i) excluding the 
special case of isolated systems with highly regular, com- 
pletely integrable dynamics; and (ii) introducing some 
form of coarse graining, such as limiting the resolution 
of realistic preparation and measurement devices [17] or
tracing out the degrees of freedom of a bath [18]. Coarse
graining entails that information about the microstate is 
siphoned off from the retained to the discarded degrees 
of freedom. This leakage becomes irreversible whenever 
the dynamics of the latter is sufficiently fast and irreg- 
ular, leading to an effective memory loss on time scales 
much shorter than those pertaining to the evolution of 
the relevant degrees of freedom [21].

In contrast, the lesser-known top down approach, pi- 
oneered by Jaynes for classical statistical mechanics [22]
and subsequently generalized [23], refrains from consid-
ering any specifics of the underlying microscopic dynam- 
ics and instead derives macroscopic irreversibility from 
the very basic requirement - essential to the scientific 
method - that macroscopic experiments be reproducible. 
The central argument is very simple: An experiment is 
reproducible if its initial preparation uniquely determines 
its final outcome; i.e., if merely on the basis of their ini- 
tial values one can predict with certainty the final val- 
ues of the relevant degrees of freedom. Since a predic- 
tion cannot possibly contain more information than the 
data on which it is based, the final values of the rele- 
vant degrees of freedom cannot carry more information 
than do their initial values. So in the course of a repro- 
ducible experiment the amount of missing information 
about the system's microstate, and hence the entropy, 
can only increase, Q.E.D. There are other top down ap- 
proaches which are similar in spirit, yet which rather than 
from "reproducibility" start from different primitives like
"adiabatic accessibility" [23].

Reversing the top down logic, violations of the second 
law may well occur; but such violations are never repro- 
ducible, and with increasing system size, become exceed- 
ingly unlikely. Experiments that purport to violate the 
second law in a reproducible fashion must presuppose 
the preparation of some special (say, highly correlated) 
initial state, or else some peculiar prior history of the 
system (such as in the classic example of spin echoes 
[25]). The apparent systematic violation of the second 
law then stems from the fact that the experimenter ac- 
tually controls degrees of freedom other than the sup- 
posedly relevant ones, either directly in the present or 
through specific interventions in the past. 

Despite their seemingly different outlooks the bottom 
up and top down approaches both revolve around the piv- 
otal issue of memory loss. They either show (bottom up) 
or simply postulate (top down) that in realistic experi- 
ments the relevant degrees' remote history has no influ- 
ence on their future evolution, and thus can be safely dis- 
regarded. This intimate connection between irreversibil- 
ity and memory loss is captured succinctly in Landauer's
principle [26], which has spawned another highly interesting
line of research [27-31].

In the present paper I wish to add yet another, and 
rather practical, perspective on the issue of thermaliza- 
tion. When a novel quantum system is fabricated and in- 
vestigated in the laboratory for the first time, its precise 
dynamics and possibly even its constants of the motion 
are not known in advance. (Of course, there is gener- 
ally some theoretical expectation; but whether this will 
be confirmed or refuted by actual measurements is not 
a priori clear.) A particular experiment might then be 
aimed at assessing whether or not a certain process leads 
to thermalization; and if so, which set of thermodynamic 
variables characterizes the final equilibrium state. Op- 
erationally, one might do this by assembling multiple 
samples, each consisting of identically prepared copies 
of the system. Each sample is prepared in a different 
initial state and subjected to the process in question. 
If thermalization does occur, subsequent quantum-state 
tomography [32] on all samples will reveal that, modulo
random fluctuations, their respective final states are dis- 
tributed on some low-dimensional submanifold of state 
space. This submanifold is composed of states of the 
Gibbs form $\rho \propto \exp(-\sum_a \lambda^a G_a)$, with the observables
$\{G_a\}$ being the constants of the motion. Their expec-
tation values or the associated Lagrange parameters, re- 
spectively, then constitute the appropriate set of thermo- 
dynamic variables. This approach to assessing thermal- 
ization is based on output data only and does not require 
tight control over the initial states of the various samples. 

Yet in a real-world setting, the system in question 
might be difficult to manufacture, and the above ideal- 
ized procedure difficult to execute. Specifically, it might 
only be possible to prepare a small number of samples, 
which in turn are small in size. As a consequence, there 
will be just a few data points in state space, which more- 
over have non-negligible error bars. Reconstructing the 
Gibbs manifold and hence the constants of the motion 
on the basis of such imperfect measurement data then 
becomes a nontrivial statistical inference task. In purely 
statistical terms, this is a situation where noisy data in 
some high-dimensional space (the tomographic images in 
state space) are presumed to be explained by a small 
number of latent variables (the expectation values of the 
constants of the motion), effectively reducing the dimen- 
sionality of the data. In such a generic setting, the task 
is to infer the optimal dimension and orientation of the 
lower-dimensional latent space. Problems of this type can 
be tackled with a variety of statistical techniques such as 
factor analysis or principal component analysis [33-40].
In the present paper, I shall build on these generic tech- 
niques to develop a statistical framework tailored to the 
relevant task of assessing whether or not thermalization 
has occurred, and if so, inferring the most plausible set 
of constants of the motion. 

Whenever the above statistical analysis suggests that 
thermalization has indeed occurred and there is one con- 
stant of the motion only, this single constant of the mo- 



tion is by default the Hamiltonian. The same statistical 
framework can then be used to estimate that Hamilto- 
nian. This estimation procedure is based on studying 
thermal properties rather than time evolution; and it uses 
only output rather than input-output data. Therefore, it 
is very different in its approach from the usual quantum-process
tomography [41-50] and Hamiltonian tomography [51-55].
As the second key result of the present pa-
per, I shall lay out this "thermal" estimation procedure 
for the Hamiltonian and illustrate its use in a simple ex- 
ample. 

The remainder of the paper is organized as follows. In
Sec. II I will present the general statistical framework
for assessing thermalization along with the key approximations
made. In Sec. III I will turn to the rather
common case where the Hamiltonian is the sole constant
of the motion, and explain how in this case one can infer
the most plausible Hamiltonian from the data. In Sec.
IV I will put the general framework to use in the simple
example of qubits, both to assess their thermalization
and to estimate the pertinent Hamiltonian. Finally, in
Sec. V I will conclude with a brief discussion.



II. ASSESSING THERMALIZATION 

Let $R$ denote the number of distinct samples and
$N_i$, $i = 1, \ldots, R$, the size of the $i$-th sample. After the
samples have undergone the process in question they are
all subjected to quantum-state tomography, which may
or may not be informationally complete. Let $\{F_b\}$ denote
the set of observables whose totals are ascertained in a tomographic
experiment (by performing measurements on
each member of the sample and adding up the results, or
via global measurements on the entire sample), and $\{f_b^i\}$
the associated sample means gleaned from the $i$-th sample.
Finally, let the quantum state $\sigma$ denote a possible
prior bias as to the samples' final state [56]; in case of
complete prior ignorance, this is simply taken to be the
totally mixed state.

The hypothesis to be tested is whether or not the totality
of experimental data $D = \{f_b^i\}$ can be explained
by the expectation values of some smaller set of observables
$\{G_a\}$, the presumed constants of the motion.
Associated with these presumed constants of the motion
and with the measured observables are subspaces
$\mathcal{G} := \mathrm{span}\{1, G_a\}$ and $\mathcal{F} := \mathrm{span}\{1, F_b\}$ of the space of
observables (with 1 being the unit operator), termed respectively
the "theoretical" and "experimental" level of
description [57]. For the former to have any explanatory
value, it must be $\dim\mathcal{G} < \dim\mathcal{F}$.

The plausibility of the theoretical hypothesis is encoded
in the posterior probability of the level of description
$\mathcal{G}$, given the data $D$ and prior bias $\sigma$. By Bayes' rule
[58], this probability of interest is given by

$$\mathrm{prob}(\mathcal{G}\,|\,D, \{N_i\}, \mathcal{F}; \sigma) \propto \mathrm{prob}(\mathcal{G})\,\mathrm{prob}(D\,|\,\{N_i\}, \mathcal{F}; \sigma, \mathcal{G}). \tag{1}$$

Whenever the prior $\mathrm{prob}(\mathcal{G})$ is sufficiently non-committal,
the right hand side will be dominated by the
likelihood function. As the various runs of the experiment
are independent, the latter can be factorized:

$$\mathrm{prob}(D\,|\,\{N_i\}, \mathcal{F}; \sigma, \mathcal{G}) = \prod_{i=1}^{R} \mathrm{prob}(D_i\,|\,N_i, \mathcal{F}; \sigma, \mathcal{G}), \tag{2}$$

with the data $D_i$ pertaining to the $i$-th sample. And
finally, according to the theoretical hypothesis, each individual
factor can be marginalized:

$$\mathrm{prob}(D_i\,|\,N_i, \mathcal{F}; \sigma, \mathcal{G}) = \int d\omega\; \mathrm{prob}(D_i\,|\,N_i, \omega, \mathcal{F})\,\mathrm{prob}(\omega\,|\,\sigma, \mathcal{G}), \tag{3}$$

where the integration ranges not over the complete state
space $\mathcal{S}$ but over the Gibbs manifold $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$ associated
with the theoretical level of description $\mathcal{G}$ and reference
state $\sigma$ [57]. This Gibbs manifold is composed of states
of the generalized Gibbs form

$$\omega \propto \exp\Big[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \sum_a \lambda^a G_a\Big], \tag{4}$$

which minimize the relative entropy with respect to $\sigma$ under
given constraints for the expectation values of $\{G_a\}$

Mm- 

For reasonably large sample sizes and a near optimal
measurement setup the first factor (likelihood) in the integrand
of Eq. (3) can be approximated with the help of
the quantum Stein lemma [61-64],

$$\mathrm{prob}(D_i\,|\,N_i, \omega, \mathcal{F}) \propto \exp[-N_i\, S(\mu_i\|\omega)]. \tag{5}$$

(Otherwise the quantum Stein lemma provides only a lower
bound.) Here $\mu_i \in \pi_{\mathcal{F}}^{\omega}(\mathcal{S})$ has the generalized Gibbs
form (4) with $\{G_a\}$ replaced by $\{F_b\}$ and reference state
$\omega$ rather than $\sigma$, and with the Lagrange parameters $\{\lambda^b\}$
adjusted such that $\langle F_b\rangle_{\mu_i} = f_b^i$ for all $b$. For both conceptual
and practical reasons I shall model the second factor
(prior) in the integrand as an entropic distribution, too,

$$\mathrm{prob}(\omega\,|\,\sigma, \mathcal{G}) \propto \exp[-\alpha\, S(\omega\|\sigma)], \tag{6}$$

with a factor of proportionality that does not depend on
$\omega$ [57]. This ansatz contains an unknown hyperparameter
$\alpha > 0$, whose most likely value will be estimated later via
the evidence procedure.
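
As a rough illustration of the two entropic factors in Eqs. (5) and (6), the following sketch evaluates them for a single qubit. It is not taken from the original analysis: the Bloch-vector parametrization, the function names `rel_entropy` and `log_integrand`, and all numerical values are assumptions chosen for illustration only.

```python
import math

def rel_entropy(r, s):
    """Relative entropy S(rho||sigma) in nats for two qubit states given by
    their Bloch vectors r and s (both strictly inside the Bloch ball)."""
    nr = math.sqrt(sum(x * x for x in r))
    ns = math.sqrt(sum(x * x for x in s))
    # tr(rho ln rho) = (1/2) ln((1 - |r|^2)/4) + |r| artanh(|r|)
    tr_rho_ln_rho = 0.5 * math.log((1.0 - nr * nr) / 4.0) + nr * math.atanh(nr)
    # tr(rho ln sigma) = (1/2) ln((1 - |s|^2)/4) + artanh(|s|) (s_hat . r)
    overlap = sum(a * b for a, b in zip(r, s))
    tilt = math.atanh(ns) * overlap / ns if ns > 0.0 else 0.0
    tr_rho_ln_sigma = 0.5 * math.log((1.0 - ns * ns) / 4.0) + tilt
    return tr_rho_ln_rho - tr_rho_ln_sigma

def log_integrand(mu_i, omega, sigma, n_i, alpha):
    """Logarithm (up to an additive constant) of the integrand of Eq. (3):
    -N_i S(mu_i||omega) - alpha S(omega||sigma)."""
    return -n_i * rel_entropy(mu_i, omega) - alpha * rel_entropy(omega, sigma)

if __name__ == "__main__":
    mu = (0.10, 0.02, 0.00)      # hypothetical tomographic image
    omega = (0.09, 0.00, 0.00)   # hypothetical model state
    sigma = (0.0, 0.0, 0.0)      # totally mixed reference state
    print(log_integrand(mu, omega, sigma, n_i=1000, alpha=5.0))
```

Near the center of the Bloch ball the relative entropy computed this way is close to half the squared Euclidean distance of the two Bloch vectors, which is the qubit version of the quadratic approximation used later in the text.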

I assume that the theoretical level of description is a
proper subspace of the experimental level, $\mathcal{G} \subset \mathcal{F}$, so
that $\pi_{\mathcal{F}}^{\omega}(\mathcal{S}) = \pi_{\mathcal{F}}^{\sigma}(\mathcal{S})$. The Gibbs manifold $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$, which
contains the theoretical models $\omega$ and the reference state
$\sigma$, is then a proper submanifold of $\pi_{\mathcal{F}}^{\sigma}(\mathcal{S})$ which contains
the tomographic images $\{\mu_i\}$. Each tomographic image
$\mu_i$ has a unique projection $\pi_i := \pi_{\mathcal{G}}^{\sigma}(\mu_i)$ on the submanifold
$\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$, where $\pi_{\mathcal{G}}^{\sigma}$ is the coarse graining operation that
maps arbitrary states to Gibbs states on $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$, thereby
preserving the expectation values of the relevant observables
$\{G_a\}$. Also on the submanifold $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$, between the
projection $\pi_i$ and the reference state $\sigma$, lies the interpolated
state (H3, El

$$\rho_i \propto \exp[(1 - x_i)\ln\pi_i + x_i\ln\sigma] \tag{7}$$

with $x_i := \alpha/(\alpha + N_i)$; its Lagrange parameters are the
weighted average of those of $\pi_i$ and $\sigma$, with respective
weights $N_i$ and $\alpha$. Finally, for both the tomographic
images $\{\mu_i\}$ and their projections $\{\pi_i\}$ one defines respective
center-of-mass states

$$\bar\mu \propto \exp\Big[\sum_{i=1}^{R} w_i \ln\mu_i\Big], \qquad \bar\pi \propto \exp\Big[\sum_{i=1}^{R} w_i \ln\pi_i\Big] \tag{8}$$

with $w_i := N_i/\sum_j N_j$, which lie on $\pi_{\mathcal{F}}^{\sigma}(\mathcal{S})$ and $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$, respectively,
and which are obtained by taking the weighted
average over all samples of the respective Lagrange parameters.
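
The interpolation (7) and the center of mass (8) are both averages of Lagrange parameters, which for a qubit can be sketched in a few lines. The following is a minimal illustration, assuming qubit states described by Bloch vectors; the helper names (`to_lagrange`, `from_lagrange`, `interpolate`, `center_of_mass`) are hypothetical and not part of the original text.

```python
import math

def to_lagrange(r):
    """Map a Bloch vector r to the coefficient vector of ln(rho):
    ln(rho) = const + lam . (Pauli vector), with lam = artanh(|r|) r/|r|."""
    n = math.sqrt(sum(x * x for x in r))
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(math.atanh(n) * x / n for x in r)

def from_lagrange(lam):
    """Inverse map: the normalized state exp(lam . Pauli)/Z has the
    Bloch vector tanh(|lam|) lam/|lam|."""
    n = math.sqrt(sum(x * x for x in lam))
    if n == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(math.tanh(n) * x / n for x in lam)

def interpolate(pi_i, sigma, n_i, alpha):
    """Interpolated state of Eq. (7): weights (1 - x_i) and x_i on
    ln(pi_i) and ln(sigma), with x_i = alpha / (alpha + N_i)."""
    x = alpha / (alpha + n_i)
    a, b = to_lagrange(pi_i), to_lagrange(sigma)
    return from_lagrange(tuple((1.0 - x) * u + x * v for u, v in zip(a, b)))

def center_of_mass(states, sizes):
    """Center-of-mass state of Eq. (8): weighted average of ln(state)
    with weights w_i = N_i / sum_j N_j."""
    total = float(sum(sizes))
    lams = [to_lagrange(s) for s in states]
    avg = tuple(sum(n * lam[k] for n, lam in zip(sizes, lams)) / total
                for k in range(3))
    return from_lagrange(avg)
```

For example, `interpolate((0.2, 0.0, 0.0), (0.0, 0.0, 0.0), n_i=100, alpha=0.0)` returns the projection itself, while a large `alpha` pulls the result toward the reference state.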

For nearby states on the manifold $\pi_{\mathcal{F}}^{\sigma}(\mathcal{S})$ the relative
entropy is approximately quadratic in their coordinate
differentials,

$$S(\mu\|\mu') \approx \frac{1}{2}\sum_{ab} (C^{-1})^{ab}\,\delta f_a\,\delta f_b. \tag{9}$$

Here $C$ denotes the correlation matrix

$$C_{ab}(\mu) := \langle\delta F_a; \delta F_b\rangle_\mu \tag{10}$$

with $\delta F_b := F_b - \langle F_b\rangle_\mu$ and canonical correlation function

$$\langle X; Y\rangle_\mu := \int_0^1 d\nu\; \mathrm{tr}(\mu^{\nu} X \mu^{1-\nu} Y). \tag{11}$$

The correlation matrix varies little between $\mu$ and $\mu'$,
and so to lowest order, can be evaluated in either of the
two states or in any other state in their vicinity. In
the following I shall assume that the tomographic images
$\{\mu_i\}$, their projections $\{\pi_i\}$, as well as their respective
centers of mass $\bar\mu$ and $\bar\pi$ all lie inside a region in
which the above quadratic ("Gaussian") approximation
is warranted, with the correlation matrix evaluated in the
center of mass $\bar\mu$, $C = C(\bar\mu)$. This presupposes that for
all samples the presumed constants of the motion take
values within a sufficiently narrow range. Moreover, I
shall assume that the sample sizes $\{N_i\}$ are sufficiently
large compared to $\alpha$ so that the interpolated states $\{\rho_i\}$,
too, lie inside this region. And finally, I assume that the
sample sizes are also large enough in absolute terms to
render the likelihood function (5) largely concentrated
inside the Gaussian region. The reference state $\sigma$, on the
other hand, need not necessarily be inside the Gaussian
region (Fig. 1).

FIG. 1. States on the manifold $\pi_{\mathcal{F}}^{\sigma}(\mathcal{S})$. Black dots indicate
the tomographic images $\{\mu_i\}$ associated with data garnered
from different samples, and the small black circle their center
of mass $\bar\mu$. The straight lines are the reduced Gibbs manifolds
$\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$ (solid line) and $\pi_{\mathcal{G}}^{\bar\mu}(\mathcal{S})$ (dashed line), respectively.
Gray dots or circles denote states on either of these reduced
Gibbs manifolds. In particular, the gray dots are obtained
by applying the coarse graining $\pi_{\mathcal{G}}^{\sigma}$ or $\pi_{\mathcal{G}}^{\bar\mu}$, respectively, to the
tomographic images. (For simplicity, not all coarse grainings
are shown.) The state $\rho_i$ is the interpolation (7) between the
coarse grained image $\pi_i$ and the reference state $\sigma$. All states
inside the big circle are assumed to be sufficiently close to
each other to warrant the Gaussian approximation for their
relative entropies; the only state that might lie outside this
Gaussian region is the reference state $\sigma$. The gray concentric
circles around one of the tomographic images indicate an
exemplary likelihood function (5). It has a width of order
$1/\sqrt{N_i}$, which is assumed to lie inside the Gaussian region.

The confinement of all pertinent states (with the exception
of the reference state) to a Gaussian region entails
a number of simplifications: Relative entropies become
approximately symmetric, $S(\mu\|\mu') \approx S(\mu'\|\mu)$; for the
interpolated states $\{\rho_i\}$, it is

$$(1 - x_i)\,S(\rho_i\|\pi_i) + x_i\,S(\rho_i\|\sigma) \approx x_i(1 - x_i)\,S(\pi_i\|\sigma) \tag{12}$$

up to corrections of order $O((\alpha/N_i)^2)$ that account for
the possible non-Gaussianity of $S(\pi_i\|\sigma)$; the centers of
mass $\bar\mu$ and $\bar\pi$ coincide approximately with the ordinary
mixtures

$$\tilde\mu := \sum_{i=1}^{R} w_i\,\mu_i, \qquad \tilde\pi := \sum_{i=1}^{R} w_i\,\pi_i, \tag{13}$$

respectively; and the coarse graining map is approximately
linear, so $\bar\pi \approx \pi_{\mathcal{G}}^{\sigma}(\bar\mu)$.

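Using these approximations, as well as the (exact) law
of Pythagoras [66]

$$S(\mu_i\|\omega) = S(\mu_i\|\pi_i) + S(\pi_i\|\omega) \tag{14}$$

for all $\omega \in \pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$ and the (exact) mixing rules

$$(1 - x_i)\,S(\omega\|\pi_i) + x_i\,S(\omega\|\sigma) = (1 - x_i)\,S(\rho_i\|\pi_i) + x_i\,S(\rho_i\|\sigma) + S(\omega\|\rho_i) \tag{15}$$

and

$$\sum_{i=1}^{R} w_i\,S(\pi_i\|\sigma) = \sum_{i=1}^{R} w_i\,S(\pi_i\|\tilde\pi) + S(\tilde\pi\|\sigma), \tag{16}$$

where $\tilde\pi$ is the ordinary mixture of Eq. (13), one obtains the log-likelihood

$$\begin{aligned}
\ln\mathrm{prob}(D\,|\,\{N_i\}, \mathcal{F}; \sigma, \mathcal{G}) \approx{}& \sum_{i=1}^{R} N_i\,[S(\pi_i\|\bar\pi) - S(\bar\mu\|\bar\pi)] - \frac{p}{2}\Lambda \\
& - \sum_{i=1}^{R} x_i N_i\,[S(\pi_i\|\bar\pi) + S(\bar\pi\|\sigma)] + \frac{p}{2}\sum_{i=1}^{R} \ln x_i N_i
\end{aligned} \tag{17}$$

with $p := \dim\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$ and $\Lambda := \sum_i \ln N_i$, modulo a
small correction term that accounts for the possible non-Gaussianity
of $S(\bar\pi\|\sigma)$ and varies only weakly with $\alpha$,
and modulo additive constants that do not depend on $\alpha$,
$\sigma$ or $\mathcal{G}$. Since $x_i N_i = (1 - x_i)\alpha < \alpha$, the terms in the last
row of Eq. (17) do not scale with sample size (at fixed $\alpha$)
and so become negligible in the regime $N_i \gg \alpha$. The log-likelihood
then approaches (again modulo additive constants
that do not depend on $\sigma$ or $\mathcal{G}$) the asymptotic
result

$$L(\mathcal{G}) := \sum_{i=1}^{R} N_i\,[S(\pi_i\|\bar\pi) - S(\bar\mu\|\bar\pi)] - \frac{p}{2}\Lambda. \tag{18}$$

This asymptotic log-likelihood is the central quantity
which I will use for my subsequent analysis.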

Strictly speaking, one has yet to check that it is consistent
to assume that $\alpha$ stays constant when taking the
limit $N_i \to \infty$; i.e., that the most likely value of $\alpha$ does
not itself scale with sample size. In order to determine
this most likely value, I follow the prescription of the evidence
procedure [58]. I consider the log-likelihood (17)
and seek its maximum as a function of $\alpha$. Setting its
derivative with respect to $\alpha$ equal to zero yields the extremum
condition

$$\sum_{i=1}^{R} (1 - x_i)\Big\{N_i x_i\,[S(\pi_i\|\bar\pi) + S(\bar\pi\|\sigma)] - \frac{p}{2}\Big\} = 0. \tag{19}$$

(This maximum likelihood condition generalizes an earlier
result for experiments on a single sample [56, 57].)
In the asymptotic limit $N_i \to \infty$ (at fixed relative entropies),
the maximum likelihood estimates for the $\{x_i\}$
must scale as the inverse sample size; and so indeed,
$\alpha = x_i N_i/(1 - x_i)$ must not scale with sample size. This
conclusion about $\alpha$ is robust as long as

$$-\alpha^2\,\frac{\partial^2}{\partial\alpha^2}\ln\mathrm{prob}(D\,|\,\{N_i\}, \mathcal{F}; \sigma, \mathcal{G}) \gg 1. \tag{20}$$

In the relevant regime $N_i \gg \alpha$ the left hand side of this
condition is approximately $pR/2$, so one has good accuracy
whenever the number of samples is sufficiently large,
$R \gg 1$.
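
The extremum condition (19) is a one-dimensional root-finding problem for $\alpha$ and can be solved numerically. The sketch below is one possible way to do so, assuming the relative entropies $s_i := S(\pi_i\|\bar\pi) + S(\bar\pi\|\sigma)$ have already been computed; the function names, the bisection bounds and the example numbers are hypothetical. In the regime $N_i \gg \alpha$, where $x_i N_i \approx \alpha$, the root should lie close to $pR/(2\sum_i s_i)$.

```python
def extremum_residual(alpha, sizes, entropies, p):
    """Left hand side of the extremum condition (19).
    entropies[i] stands for S(pi_i||pi_bar) + S(pi_bar||sigma)."""
    total = 0.0
    for n_i, s_i in zip(sizes, entropies):
        x_i = alpha / (alpha + n_i)
        total += (1.0 - x_i) * (n_i * x_i * s_i - 0.5 * p)
    return total

def most_likely_alpha(sizes, entropies, p, lo=1e-9, hi=1e9, steps=200):
    """Bisection on a logarithmic scale for the root of Eq. (19).
    Assumes exactly one sign change of the residual between lo and hi."""
    if extremum_residual(lo, sizes, entropies, p) >= 0.0:
        raise ValueError("residual is not negative at the lower bound")
    if extremum_residual(hi, sizes, entropies, p) <= 0.0:
        raise ValueError("residual is not positive at the upper bound")
    for _ in range(steps):
        mid = (lo * hi) ** 0.5
        if extremum_residual(mid, sizes, entropies, p) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

if __name__ == "__main__":
    # five hypothetical samples of one million copies each, p = 1
    print(most_likely_alpha([10**6] * 5, [0.05] * 5, p=1))
```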

The asymptotic log-likelihood (18) is the difference of
two sums, reflecting a trade-off that is typical for model
selection [67]. The first sum gets bigger as the theoretical
level of description becomes more detailed and yields a
better fit with the data; in fact, it is maximal for the
largest possible level of description, $\mathcal{G} = \mathcal{F}$. The sum
which is subtracted from this, on the other hand, being
proportional to the Gibbs manifold dimension, penalizes
excessive detail; it embodies "Occam's razor". Therefore,
finding the most plausible level of description and hence
the constants of the motion always involves a trade-off
between goodness of fit and simplicity.

In case the reference state $\sigma$ is not given a priori
but is itself a variable to be inferred, one must consider
the asymptotic log-likelihood (18) also as a function
of $\sigma$. The log-likelihood attains its maximum for
any $\sigma \in \pi_{\mathcal{G}}^{\bar\mu}(\mathcal{S})$; then the relative entropy $S(\bar\mu\|\bar\pi)$ vanishes.
Using such a maximum likelihood estimate for $\sigma$,
and assuming further that the dimension $p$ of the Gibbs
manifold is fixed from the outset, the remaining optimization
of (the orientation of) $\mathcal{G}$ reduces to maximizing the
weighted average of the relative entropies $\{S(\pi_{\mathcal{G}}^{\bar\mu}(\mu_i)\|\bar\mu)\}$.
In the Gaussian regime, this is tantamount to the optimization
task known in statistics as "principal component
analysis" [33-40].
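
For orientation, the principal-component step can be sketched for qubit data as follows. This is a generic illustration rather than the procedure of the paper: it assumes a Euclidean metric on Bloch vectors (a correlation matrix close to the identity, which is reasonable only near the center of the Bloch ball), it uses plain power iteration, and the function names are hypothetical.

```python
import math

def covariance(points, weights):
    """Weighted 3x3 covariance matrix of the data points about their
    weighted mean (weights are normalized internally)."""
    total = float(sum(weights))
    w = [x / total for x in weights]
    mean = [sum(wi * pt[k] for wi, pt in zip(w, points)) for k in range(3)]
    return [[sum(wi * (pt[a] - mean[a]) * (pt[b] - mean[b])
                 for wi, pt in zip(w, points))
             for b in range(3)] for a in range(3)]

def dominant_direction(matrix, iterations=500):
    """Power iteration for the eigenvector with the largest eigenvalue.
    The start vector (1, 1, 1) must not be orthogonal to that eigenvector."""
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        u = [sum(matrix[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            raise ValueError("covariance matrix vanishes")
        v = [x / norm for x in u]
    eigenvalue = sum(v[a] * matrix[a][b] * v[b]
                     for a in range(3) for b in range(3))
    return eigenvalue, v

if __name__ == "__main__":
    # hypothetical tomographic images scattered mainly along the x axis
    data = [(0.10, 0.01, 0.0), (-0.10, -0.01, 0.0),
            (0.05, 0.0, 0.01), (-0.05, 0.0, -0.01)]
    print(dominant_direction(covariance(data, [1, 1, 1, 1])))
```

The dominant eigenvector plays the role of the best one-dimensional level of description when the reference state is itself fitted to the data.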

Now I turn to the general case in which there is an
arbitrary given reference state, and where both the dimension
and the orientation of the explanatory level of
description are to be inferred. Suppose there are two rival
proposals for the level of description, $\mathcal{G}$ and $\mathcal{H}$, where
the latter is more detailed than the former (and both
are contained in the experimental level of description),
$\mathcal{G} \subset \mathcal{H} \subset \mathcal{F}$. The associated Gibbs manifolds $\pi_{\mathcal{G}}^{\sigma}(\mathcal{S})$ and
$\pi_{\mathcal{H}}^{\sigma}(\mathcal{S})$ have respective manifold dimensions $p$ and $p + s$.
As discussed earlier, the choice between the two proposals
will involve a trade-off between goodness of fit (favoring
$\mathcal{H}$) and simplicity (favoring $\mathcal{G}$). Using the fact that
within the Gaussian region the relative entropy of two
coarse grained states is approximately invariant under a
change of reference state $\sigma \to \bar\mu$,

$$S(\pi_i\|\bar\pi) \approx S(\pi_{\mathcal{G}}^{\bar\mu}(\mu_i)\|\bar\mu) \tag{21}$$

(and likewise for $\mathcal{H}$), the difference of the asymptotic log-likelihoods
can be written as

$$L(\mathcal{H}) - L(\mathcal{G}) \approx \sum_{i=1}^{R} N_i\,\big[S(\pi_{\mathcal{H}}^{\bar\mu}(\mu_i)\|\pi_{\mathcal{G}}^{\bar\mu}(\mu_i)) + S(\pi_{\mathcal{H}}^{\sigma}(\bar\mu)\|\pi_{\mathcal{G}}^{\sigma}(\bar\mu))\big] - \frac{s\Lambda}{2}. \tag{22}$$

If this difference is positive, the more detailed level of
description $\mathcal{H}$ is called for; if it is negative, one had better
stick to the simpler model $\mathcal{G}$. This criterion extends an
earlier result obtained in Ref. [57] for experiments on a
single sample.

Finding the optimal level of description, and hence the
most plausible set of constants of the motion, can now
proceed in two ways: either directly, by maximizing the
asymptotic log-likelihood (18) as a function of $\mathcal{G}$; or indirectly
(and usually more feasible in practice), by formulating
various hypotheses about the level of description
and then comparing them by means of the difference criterion
(22). If the optimal $\mathcal{G}$ is spanned by only one or
very few observables (aside from the unit operator), this
indicates that thermalization has indeed occurred.

The reconstruction of the appropriate level of description
precedes the reconstruction of the quantum state
of any individual system. The former requires data from
the totality of all samples. Once the reconstruction of the
level of description has succeeded, one may take this level
as a given and turn to reconstructing the Gibbs state of
an individual system, based on data from the pertinent
sample only, by means of well-known state estimation
techniques [57].

III. HAMILTONIAN ESTIMATION 

Whenever the above statistical analysis reveals, or it
is posited from the outset, that there is only one constant
of the motion, this is by default the Hamiltonian.
The Gibbs manifold is then made up of canonical states
$\rho \propto \exp(-\beta H)$, with Hamiltonian $H$ and inverse temperature
$\beta$. (For a non-uniform reference state there is an
additional term $(\ln\sigma - \langle\ln\sigma\rangle_\sigma)$ in the exponent.) Strictly
speaking, in case the $\{F_b\}$ are not informationally complete,
$H$ is not the full Hamiltonian but the effective
Hamiltonian pertaining to the measured degrees of freedom.
Since the latter usually coincide with the slow degrees
of freedom, $H$ is then an effective low-energy Hamiltonian.
If the Hamiltonian is not known in advance, it
must be estimated from the data. In this section, I shall
lay out the appropriate estimation procedure.

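Let the Hamiltonian be parametrized by some set of
parameters $\xi = \{\xi^b\}$. Then so are the coarse grained
states

$$\pi_i(\xi) = Z(\beta_i, \xi)^{-1}\exp[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \beta_i H(\xi)] \tag{23}$$

with arbitrary reference state $\sigma$, where the partition function

$$Z(\beta_i, \xi) := \mathrm{tr}\{\exp[(\ln\sigma - \langle\ln\sigma\rangle_\sigma) - \beta_i H(\xi)]\} \tag{24}$$

ensures state normalization, and the inverse temperature
$\beta_i$ is adjusted such that $\langle H(\xi)\rangle_{\pi_i(\xi)} = \langle H(\xi)\rangle_{\mu_i} =: U_i(\xi)$.
Their weighted average $\bar\pi(\xi)$, equally parametrized by $\xi$,
has (in the Gaussian approximation) the same canonical
form, with inverse temperature $\bar\beta = \sum_i w_i\beta_i$ and internal
energy $\bar U = \sum_i w_i U_i$. The asymptotic log-likelihood (18)
thus becomes a function of $\xi$. It attains its maximum
when

$$\partial_b \sum_{i=1}^{R} w_i\,S(\pi_i(\xi)\|\bar\pi(\xi)) = \partial_b\,S(\bar\mu\|\bar\pi(\xi)), \tag{25}$$

where $\partial_b := \partial/\partial\xi^b$. To evaluate the left hand side of this extremization
condition, I use Eq. (21) and the Gaussian approximation
to write

$$\sum_{i=1}^{R} w_i\,S(\pi_i(\xi)\|\bar\pi(\xi)) \approx \frac{C(\xi)^{-1}}{2}\,\mathrm{var}(U), \tag{26}$$

where

$$C(\xi) := \langle\delta H(\xi); \delta H(\xi)\rangle_{\bar\mu}, \tag{27}$$

with $\delta H(\xi) := H(\xi) - \bar U(\xi)$, and

$$\mathrm{var}(U) := \sum_{i=1}^{R} w_i\,(U_i - \bar U)^2. \tag{28}$$

The latter two functions have the respective derivatives

$$\partial_b C(\xi) = 2\,\langle\delta H(\xi); \delta(\partial_b H)\rangle_{\bar\mu} \tag{29}$$

with $\delta(\partial_b H) := \partial_b H - \partial_b \bar U$ and

$$\partial_b\,\mathrm{var}(U) = 2\,\mathrm{cov}(U, \partial_b U) \tag{30}$$

with covariance

$$\mathrm{cov}(U, \partial_b U) := \sum_{i=1}^{R} w_i\,(U_i - \bar U)(\partial_b U_i - \partial_b \bar U). \tag{31}$$

The right hand side of the extremization condition is
given by

$$\partial_b\,S(\bar\mu\|\bar\pi(\xi)) = \bar\beta\,\big(\langle\partial_b H\rangle_{\bar\mu} - \langle\partial_b H\rangle_{\bar\pi(\xi)}\big). \tag{32}$$

Altogether, this yields the condition

$$\mathrm{cov}(U, \partial_b U) - C(\xi)^{-1}\,\mathrm{var}(U)\,\langle\delta H(\xi); \delta(\partial_b H)\rangle_{\bar\mu} = \bar\beta\,C(\xi)\,\big(\langle\partial_b H\rangle_{\bar\mu} - \langle\partial_b H\rangle_{\bar\pi(\xi)}\big). \tag{33}$$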

One particularly simple ansatz for the Hamiltonian is
the linear form

$$H(\xi) = -\sum_b \xi^b F_b, \tag{34}$$

modulo some additive constant. For the implementation
of this ansatz it will be convenient to adopt a number
of index conventions in the style of general relativity:
Identical upper and lower indices are to be summed
over; the correlation matrix $C$ (Eq. (10)) and its inverse
$C^{-1}$ lower or raise indices, respectively, akin to a
metric tensor [68]; and the scalar product is defined as
$x \cdot y := x^a y_a = C_{ab}\,x^a y^b = (C^{-1})^{ab}\,x_a y_b$. Furthermore, I
define the covariance matrix

$$\Gamma_{ab} := \sum_{i=1}^{R} w_i\,(f_a^i - \bar f_a)(f_b^i - \bar f_b) \tag{35}$$

with $\bar f_b := \sum_i w_i f_b^i$, its "expectation value"

$$\langle\Gamma\rangle_\xi := \frac{\Gamma_{ab}\,\xi^a \xi^b}{\xi\cdot\xi}, \tag{36}$$

as well as

$$\delta f_b(\xi) := \langle F_b\rangle_{\bar\pi(\xi)} - \bar f_b \tag{37}$$

and $N := \sum_i N_i$. With these conventions and definitions
the asymptotic log-likelihood (18) for the level of
description $\mathcal{H}(\xi) := \mathrm{span}\{1, H(\xi)\}$ reads

$$L(\mathcal{H}(\xi)) \approx (N/2)\,[\langle\Gamma\rangle_\xi - \delta f(\xi)\cdot\delta f(\xi)] - (\Lambda/2). \tag{38}$$

If one is still uncertain as to whether the process in
question has actually led to thermalization, yet can already
exclude the existence of other constants of the
motion besides the Hamiltonian, one must compare the
log-likelihood of $\mathcal{H}(\xi)$ for all values of $\xi$ with the log-likelihood
of $\mathcal{F}$, i.e., the hypothesis that the data do not
warrant any dimensional reduction at all. The latter log-likelihood
is given by

$$L(\mathcal{F}) \approx (N/2)\,\mathrm{tr}(\Gamma) - (\Lambda/2)\,\dim\pi_{\mathcal{F}}^{\sigma}(\mathcal{S}), \tag{39}$$

where $\mathrm{tr}(\Gamma) := \Gamma^a{}_a$. The process may be considered "thermalizing"
with Hamiltonian $H(\xi)$ iff $L(\mathcal{F}) < L(\mathcal{H}(\xi))$,
and hence

$$[\mathrm{tr}(\Gamma) - \langle\Gamma\rangle_\xi] + \delta f(\xi)\cdot\delta f(\xi) < (\Lambda/N)\,[\dim\pi_{\mathcal{F}}^{\sigma}(\mathcal{S}) - 1]. \tag{40}$$
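
To make the comparison of Eqs. (38) and (39) concrete, here is a minimal sketch for qubit data. It rests on assumptions that go beyond the text at this point: a uniform reference state, a correlation matrix equal to the identity (Euclidean scalar product on Bloch vectors), and $\dim\pi_{\mathcal{F}}^{\sigma}(\mathcal{S}) = 3$ for the three Pauli operators. The function name `thermalization_test` and the sample values are hypothetical.

```python
import math

def thermalization_test(images, sizes, xi):
    """Evaluate Eqs. (38) and (39) for qubit data.
    images: Bloch vectors f^i of the tomographic images; sizes: N_i;
    xi: trial parameter vector of the linear ansatz H = -xi . F.
    Assumes C = identity and a uniform reference state."""
    n_total = float(sum(sizes))
    w = [n / n_total for n in sizes]
    lam = sum(math.log(n) for n in sizes)                 # Lambda = sum_i ln N_i
    f_bar = [sum(wi * f[k] for wi, f in zip(w, images)) for k in range(3)]
    gamma = [[sum(wi * (f[a] - f_bar[a]) * (f[b] - f_bar[b])
                  for wi, f in zip(w, images))
              for b in range(3)] for a in range(3)]       # Eq. (35)
    xi2 = sum(x * x for x in xi)
    gamma_xi = sum(xi[a] * gamma[a][b] * xi[b]
                   for a in range(3) for b in range(3)) / xi2
    # Canonical qubit states lie on the straight line through the origin
    # along xi; matching the energy fixes the point on that line.
    scale = sum(x * f for x, f in zip(xi, f_bar)) / xi2
    delta_f = [scale * x - f for x, f in zip(xi, f_bar)]  # Eq. (37)
    df2 = sum(d * d for d in delta_f)
    trace = sum(gamma[a][a] for a in range(3))
    l_h = 0.5 * n_total * (gamma_xi - df2) - 0.5 * lam    # Eq. (38)
    l_f = 0.5 * n_total * trace - 0.5 * lam * 3           # Eq. (39), dim = 3
    return l_h, l_f, l_f < l_h

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.05), (0.0, 0.0, 0.10), (0.0, 0.0, 0.15)]
    print(thermalization_test(pts, [1000, 1000, 1000], (0.0, 0.0, 1.0)))
```

When all images lie exactly on a line through the origin and the trial vector points along that line, the two log-likelihoods differ only by the Occam penalty, so the one-dimensional description wins by $\Lambda$.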

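The most likely value of $\xi$ is determined by the maximum
likelihood condition (33), which for the linear
ansatz simplifies to

$$(\delta_\xi\Gamma)\,\xi = \bar\beta\,(\xi\cdot\xi)\,\delta f(\xi) \tag{41}$$

with matrix $\delta_\xi\Gamma := \Gamma - \langle\Gamma\rangle_\xi$. In order to estimate $\bar\beta$, I
consider

$$\delta(\ln\bar\pi(\xi)) = \bar\beta\,\xi\cdot\delta F + \ln\sigma - \langle\ln\sigma\rangle_{\bar\mu}, \tag{42}$$

where as before $\delta X := X - \langle X\rangle_{\bar\mu}$. In the typical case of
a uniform reference state $\sigma$ the latter two terms cancel
so that

$$\bar\beta^2\,\xi\cdot\xi = \langle\delta(\ln\bar\pi(\xi)); \delta(\ln\bar\pi(\xi))\rangle_{\bar\mu}. \tag{43}$$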

The right hand side in turn may be approximated to 
lowest order by 

(5(ln^(0);5(ln7f(0))p « Wln^ln^. (44) 

IV. EXAMPLE: QUBITS

In the following, I shall illustrate the general framework
in the simple example of qubits, which is tractable
analytically. In this example the $\{F_b\}$ are the Pauli operators,
and the parameter vector $\xi$ may be viewed as
(parallel to) an effective magnetic field. In the typical
case of a uniform reference state $\sigma$ the expectation values
of $F$ in the states $\bar\mu$ and $\bar\pi(\xi)$ are related linearly:



(45) 



As a result, the maximum likelihood condition (41) becomes



c - a • /)/• 



(46) 



This condition no longer depends on $\bar\beta$, and moreover, is
invariant under rescaling of $\xi$. Without loss of generality,
therefore, $\xi$ may be taken to be normalized, $\xi\cdot\xi = 1$.

For qubits the covariance matrix $\Gamma$ is a 3 x 3 matrix. To
simplify matters, I assume that it singles out one dominant
direction $\gamma$, and is isotropic in the remaining two
directions:

$$\Gamma\xi = \Gamma_+\,(\gamma\cdot\xi)\,\gamma + \Gamma_-\,P\xi, \tag{47}$$

where the projector $P$ projects orthogonally (with respect
to the scalar product used here) onto the subspace
complementary to $\gamma$, and $\Gamma_+, \Gamma_-$ with $\Gamma_+ > \Gamma_-$ are the
respective eigenvalues. The unit vectors $\gamma$ and $\hat f$ (the unit
vector pointing in the direction of $\bar f$) then constitute the
two preferred directions in the problem. Symmetry dictates
that the solution of the maximum likelihood condition
(46) must lie in the subspace spanned by these
two preferred directions, $\xi \in \mathrm{span}\{\gamma, \hat f\}$. In fact, if $\gamma$ is
aligned with $\hat f$, the solution is $\xi = \gamma = \hat f$. In case $\gamma$ and
$\hat f$ are not aligned, the solution will generally not coincide
with either of the two.

In order to quantify how $\xi$ interpolates between $\gamma$ and
$\hat f$ in case the two are not aligned, I define a further unit
vector $\eta \propto P\hat f$, the normalized projection of $\hat f$ onto the
subspace complementary to $\gamma$. To lowest (first) order perturbation
theory in $(\eta\cdot\hat f)$, i.e., for small misalignments,
the maximum likelihood condition (46) has the solution



1 



Vf,T^l-0((vff). (48) 



This result illustrates nicely how the maximum likelihood
algorithm interpolates between alignment with the center
of mass ($\xi = \hat f$) and alignment with the covariance
pattern ($\eta\cdot\xi = 0$). For a perfectly isotropic covariance
pattern ($\Gamma_+ = \Gamma_-$) the parameter vector is aligned with
$\hat f$. The more pronounced the anisotropy of the covariance
pattern ($\Gamma_+ \gg \Gamma_-$) and the smaller the lever of the center
of mass ($\bar f\cdot\bar f$ small), the more $\xi$ tends to be aligned with
$\gamma$.

Inserting the maximum likelihood solution into the formula
(38) for the log-likelihood yields, to lowest order in
perturbation theory,



max 




r+-r_ 



(vf) 2 \-^9) 



The maximum likelihood solution satisfies the thermalization
condition (40) if and only if



A 6 2 A 

Y << iV 



1 



f 



(50) 



where $\theta$ is the tilting angle between $\gamma$ and $\hat f$, $\sin\theta := \eta\cdot\hat f$.

As a simple numerical example, I consider data gleaned
from multiple qubit samples of identical size $N_i = 20{,}000$,
and hence $\ln N_i \approx 10$. I assume that the
distribution of tomographic images has a width which in
the dominant direction is comparable in magnitude to
the distance of the center of mass from the origin; more
specifically, that both are about 1/10 of the radius of the
Bloch sphere, $\Gamma_+ \approx f \cdot f \approx 1/100$. In the other directions,
the standard deviation of the tomographic images is
assumed to be smaller by a factor 10, $\Gamma_- \approx \Gamma_+/100$. The
dominant direction $\gamma$ and the orientation $f$ of the center
of mass are not aligned; rather, they are tilted against
each other by an angle $\theta = \pi/10$. This raises doubts
about thermalization, as the canonical curves of a qubit
are straight lines through the origin of the Bloch sphere.
May the qubits nevertheless be considered thermalized?
In fact they may, as the standard deviation and the tilting
angle still satisfy both thermalization conditions in Eq.
(50). Their most plausible Hamiltonian, parametrized as
in the ansatz (34), contains an effective magnetic field $\xi$
which (modulo rescaling) is given by Eq. (48), and which
in this example approximately bisects the angle between
$\gamma$ and $f$.
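The interpolation described in this example can be explored numerically with a minimal sketch. It rests on a simplifying assumption that is not taken from the text: the fitted axis is modelled as the principal eigenvector of the second-moment matrix (covariance plus outer product of the center of mass), i.e. the least-squares straight line through the origin, rather than as the solution of the maximum likelihood condition (46). The function name, the two-dimensional basis, and the reading of the tilt angle as pi/10 are likewise illustrative choices.

```python
import numpy as np

def fitted_axis_angle(gamma_plus, gamma_minus, f_norm, theta):
    """Angle (radians) between the fitted axis xi and the dominant direction gamma.

    ASSUMPTION (illustrative, not the paper's likelihood): xi is the principal
    eigenvector of the second-moment matrix M = Gamma + f f^T, which is what a
    least-squares fit of a straight line through the origin would return.
    """
    # Work in the plane span{gamma, f}: e0 = gamma, e1 = eta (orthogonal to gamma).
    covariance = np.diag([gamma_plus, gamma_minus])
    f = f_norm * np.array([np.cos(theta), np.sin(theta)])  # center of mass
    second_moment = covariance + np.outer(f, f)
    # eigh returns eigenvalues in ascending order; the last column is dominant.
    _, vectors = np.linalg.eigh(second_moment)
    xi = vectors[:, -1]
    if xi[0] < 0:  # eigenvectors are defined up to sign; orient xi along +gamma
        xi = -xi
    return np.arctan2(xi[1], xi[0])

theta = np.pi / 10              # tilt angle between gamma and f (assumed reading)
gamma_plus = 1.0 / 100          # variance in the dominant direction
gamma_minus = gamma_plus / 100  # variance in the other directions
f_norm = 0.1                    # |f|, so that f.f = 1/100

example = fitted_axis_angle(gamma_plus, gamma_minus, f_norm, theta)
isotropic = fitted_axis_angle(gamma_plus, gamma_plus, f_norm, theta)
short_lever = fitted_axis_angle(gamma_plus, gamma_minus, 0.01, theta)

print("example:     angle / theta =", example / theta)
print("isotropic:   angle / theta =", isotropic / theta)
print("short lever: angle / theta =", short_lever / theta)
```

Under this assumption the axis in the example lies close to half the tilt angle, an isotropic covariance aligns it with the center of mass, and a short lever pulls it towards the dominant covariance direction, in line with the qualitative behaviour described above.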

In the above example the preferred axis $\xi$ of the
Hamiltonian is inferred from the data, rather than given or
conjectured from the outset. This distinguishes this example
from other inference tasks where one weighs the hypothe- 
sis of some a priori fixed axis against the hypothesis that 
no such preferred axis exists, for instance when compar- 
ing Ising and Heisenberg models for an anisotropic ferro- 
magnet on the basis of a single sample [l?} • 



V. CONCLUSIONS 

In this paper I focused not on the theoretical question
whether or not some system with a given Hamiltonian
ought to thermalize, but on the practical question
whether or not experimental data indicate that a system 
with hitherto unknown dynamics has actually thermal- 
ized. This issue never really arises for systems that are 
macroscopic. Outside the macroscopic realm, however,
and with data pertaining to small samples composed of, 
say, a few hundred system copies only, it becomes a non- 
trivial statistical inference task. I have laid out the ap- 
propriate statistical framework for assessing thermaliza- 
tion under such adverse conditions. 

In case the data do support the hypothesis of ther- 
malization, and provided there is no evidence for addi- 
tional constants of the motion, I have shown how the data 
can be used to estimate the system's unknown Hamilto- 
nian. Hamiltonian estimation is increasingly important 
in quantum technology, as it is needed to assess and cer- 
tify the proper functioning of quantum devices. Since my 
estimation scheme is based on studying thermal proper- 
ties rather than time evolution and thus requires output 
data only, it may constitute a viable alternative to con- 
ventional time-based approaches especially in situations 
where initial states or time are difficult to control. 

Aside from its practical relevance, the framework pre- 
sented here is also of interest conceptually. One exam- 
ple is a better understanding of the iterative dynamics 
of thermalization. Whenever a physical system exhibits 
a hierarchy of time scales, thermalization typically oc- 
curs in stages, on successively longer time scales. For 
instance, a dense plasma, initially in the kinetic regime 
far from equilibrium, might quickly equilibrate locally 
and thus enter the hydrodynamic regime, but only much 
later reach global equilibrium [U [63]. Associated with
these various stages are successively smaller levels of de- 
scription; in this particular example, first the Boltzmann 
level of description (all single-particle observables) , then 
the hydrodynamic level of description (local particle, en- 
ergy and momentum density) , and finally the equilibrium 
level of description (total energy and particle number). 
Thermalization is thus accompanied by a sequence of 
level contractions. The framework developed here pro- 
vides the quantitative criterion as to when exactly these 
level contractions are warranted. 

I see four routes for further research. First, it will be 
important to test the mathematical framework developed 
here on real or simulated experimental data. In principle, 
any experiment that probes only tiny samples of matter 
such as an array of atoms or the debris from a single high 
energy collision [7(| will lend itself to such an analysis. 
Processing the data will likely require the use of suitable
numerical techniques.

Second, in the present paper I made a number of ide- 
alizing assumptions. For instance, I assumed that the 
only source of experimental error is projection noise due 
to the finiteness of the samples, whereas there is no error 
stemming from inaccuracies of the measurement devices. 
Moreover, I took the tomographic measurement setup to 
be near optimal in the sense of the quantum Stein lemma. 
In case the observables {F^} do not commute, this may 
involve global measurements which are difficult to imple- 
ment in practice. In future work I plan to investigate 
how the mathematical framework must be adapted when 
these assumptions are relaxed. 

Third, on a more conceptual level, I consider it worth- 
while to generalize the mathematical framework in the 
following way. While the approach laid out in the present 
paper aims to infer the most plausible level of description 
in a single step, a different approach might split this into 
two distinct inference tasks: first estimating the optimal 
dimension of the level of description; and then, given 
the dimension, its optimal orientation. In this alterna- 
tive approach the first step involves an additional Occam 
factor, and so in principle, might lead to other conclu- 
sions than the present approach. It will be interesting 
to understand under which circumstances such divergent 
conclusions may arise, and why. 

Finally, also on the conceptual level, the pivotal
log-likelihood function L(Q) which features in the
statistical analysis depends on a number of scaling
parameters: the total number R of samples, their sizes $\{N_i\}$,
the Gibbs manifold dimension p, and - when calibrat- 
ing against L(J-) - the number of different measurement 
setups. I propose to investigate in more detail how the 
log-likelihood scales with each of these parameters, and 
whether any general conclusions can be drawn from this 
about the typicality of thermalization in different scaling 
regimes. 